DETAILED ACTION

Continued Examination Under 37 CFR 1.114 

1. A request for continued examination under 37 CFR 1.114, including the fee set
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 
12/19/2008 has been entered. 

Response to Arguments 

2. Applicant's arguments with respect to claims 1-35 have been considered but are
moot in view of the new grounds of rejection. Examiner has withdrawn Pentheroudakis 
et al. US 7092871 B2 (hereinafter Pentheroudakis) and Bai et al. US 6311152 B1 
(hereinafter Bai), and has incorporated Deligne et al. US 6314399 B1 (hereinafter 
Deligne) and Millett et al. US 6584458 B1 (hereinafter Millett), wherein Examiner 
believes that the scope of the claims is now pertinent to Deligne in view of Millett. 
Additionally, Examiner has withdrawn Bakis et al. US 6023673 A (hereinafter Bakis) and 
incorporated Hwang et al. US 20020082831 A1 (hereinafter Hwang). 

Claim Rejections - 35 USC § 101 

3. 35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

Claims 26-29 and 34 are rejected under 35 U.S.C. 101 because: 
Claims 26-29 and 34 do not fall within one of the four statutory categories of 
invention. Supreme Court precedent 1 and recent Federal Circuit decisions 2 indicate 
that a statutory "process" under 35 U.S.C. 101 must (1) be tied to another statutory 
category (such as a particular apparatus), or (2) transform underlying subject matter 
(such as an article or material) to a different state or thing. While the instant claim(s) 
recite a series of steps or acts to be performed, the claim(s) neither transform 
underlying subject matter nor positively tie to another statutory category that 
accomplishes the claimed method steps, and therefore do not qualify as a statutory 
process. 

Claims 26-29 and 34 recite purely mental steps and would not qualify as a 
statutory process. In order to qualify as a statutory process, the method claim should 
positively recite the other statutory class to which it is tied (i.e. apparatus, device, 
product, etc.). For example, the method steps of claims 26-29 and 34 appear to recite
mental steps such as "a speech recognition method for recognizing speech" and do not 
identify an apparatus that performs the recited method steps, such as the speech 
recognition apparatus/computer as described in the specification (present invention 
page 14). 

1 Diamond v. Diehr, 450 U.S. 175, 184 (1981); Parker v. Flook, 437 U.S. 584, 588 n.9 (1978); Gottschalk 
v. Benson, 409 U.S. 63, 70 (1972); Cochrane v. Deener, 94 U.S. 780, 787-88 (1876). 

2 In re Bilski, 88 USPQ2d 1385 (Fed. Cir. 2008).

Claim Rejections - 35 USC § 103

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1-5, 7, 9, 13-18, 23 and 26-35 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Rigazio et al. US 6182039 B1 (hereinafter Rigazio) in view of Deligne 
et al. US 6314399 B1 (hereinafter Deligne) and further in view of Millett et al. US 
6584458 B1 (hereinafter Millett). 

Re claims 1, 13, 14, and 26-30, Rigazio teaches a language model generation and
accumulation apparatus that generates and accumulates language models for speech 
recognition, the apparatus comprising: 

a lower-level N-gram language model (Col. 6 lines 11-20) generation and
accumulation unit operable to generate and accumulate a lower-level N-gram language 
model that is obtained by modeling (Col. 4 lines 30-55 & Fig. 2) a sequence of two or 
more words within the word string class; 

However, Rigazio fails to teach a word string class and a plurality of texts as a
second sequence of words that includes the word string class, and

a higher-level N-gram language model generation and accumulation unit
operable to generate and accumulate a higher-level N-gram language model that is
obtained by modeling each of a plurality of texts as a sequence of words that includes a
word string class indicating a linguistic property of a word string constituting two or more
words.

Deligne teaches well-known limitations of previous technology, wherein Deligne
teaches that class versions of phrase-based models can be defined in a way similar to the
way class versions of N-gram models are defined, i.e., by assigning class labels to the
phrases. In the prior art, this consists of first assigning word class labels to the words and
then defining a phrase class label for each distinct phrase of word class labels. A
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 
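
As an illustration of the drawback described above, the following is a minimal sketch, not taken from Deligne, of the prior-art labeling scheme: each word is first mapped to a word class label, and the phrase class label is then the sequence of those word class labels. The table WORD_CLASS, its label values, and the helper phrase_class_label are hypothetical names introduced only for this example.

```python
# Hypothetical word-class table; the labels are illustrative assumptions,
# not values taken from Deligne.
WORD_CLASS = {
    "thank": "C1",
    "you": "C2",
    "very": "C3",
    "much": "C4",
}


def phrase_class_label(phrase):
    """Return the phrase class label as the tuple of word class labels."""
    return tuple(WORD_CLASS[word] for word in phrase.split())


short_label = phrase_class_label("thank you")
long_label = phrase_class_label("thank you very much")

# Phrases of different lengths yield label sequences of different lengths,
# so they cannot share a phrase class label under this scheme.
print(short_label)
print(long_label)
print(short_label == long_label)
```

Under this assumed scheme the two labels differ in length, which is the limitation that direct clustering of phrases is said to address.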

Further, Deligne improves on these limitations by explaining the clustering
(classification process) of variable-length phrases. Recently, class-phrase
based models have gained some attention, but usually, as in Prior Art
Reference 1, they assume a previous clustering of the words. Typically, each word is first
assigned a word-class label C_k, then variable-length phrases, wherein the phrases
"thank you for" and "thank you very much for" cannot be assigned the same class label.
In the present preferred embodiment, it is proposed to address this limitation by directly
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase
distribution, and the step SS2 ensures that the bigram distribution of the phrases
optimizes the likelihood calculated according to Equation (19) with the current class
distribution. The training data are thus iteratively structured at both a paradigmatic and
a syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate a word string class
and a plurality of texts as a second sequence of words that includes the word string class
and a higher-level N-gram language model generation and accumulation unit operable
to generate and accumulate a higher-level N-gram language model that is obtained by
modeling each of a plurality of texts as a sequence of words that includes a word string 
class indicating a linguistic property of a word string constituting two or more words as 
taught by Deligne to allow for optimal class assignment to account for sentence and 
word based modeling in speech recognition (Deligne Col. 10 lines 43-60). 

However, Rigazio in view of Deligne fails to teach a word string class that further
includes a virtual word denoting a beginning of the word string class and a virtual word
denoting an end of the word string class.

Millett teaches that it is necessary to increase the current 'virtual word number'
up to the beginning of the next word cluster boundary whenever the end of a data item 
is reached in the word stream. The only way to know this is to place a marker in the 
word stream signaling the end of a data item. For non-word level indexes, granules 
naturally fall within data items, so there is not a problem with a row in the granule cross 
reference table referring to more than one data item (Millett Col. 11 lines 19-29).

Further, Millett teaches that granule boundary markers 58 are used to demarcate 
the beginning and end of granules (e.g., "<MB>" for the beginning of a granule and 
"<ME>" for the end of a granule 60), as shown in FIG. 2. As used herein, the term 
"granule" and its derivatives refers to a predetermined set of text, or an indexing unit. 
The granule size determines the degree to which the location of a word within a 
document can be determined. For example, a document level granularity would be able 
to identify the document in which a word appears but not the page or paragraph. A 
paragraph level granularity would be able to more precisely identify the paragraph
within a document where a word appears, while a word level granularity would be able 
to identify the sequential word location of a word (e.g., the first word of the document, 
the second word of the document, etc.). As the granularity increases and approaches 
word level granularity, the size and complexity of an index increases, but word locations 
can be more precisely defined. The purpose of the word stream 44 is to track the 
granules in which a word occurs, not the total number of occurrences of the word. 
(Millett Col. 4 line 50 - Col. 5 line 15).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Rigazio in view of Deligne to incorporate a
word string class that further includes a virtual word denoting a beginning of the word
string class and a virtual word denoting an end of the word string class as taught by
Millett to allow for the identification of varying text (i.e. phrases or words or paragraphs),
wherein markers (i.e. virtual words) are used to tag the beginning and end of the
specified granule (i.e. phrases, words, paragraphs, etc.) (Millett Col. 4 line 50 - Col. 5
line 15).
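
The marker scheme relied on above can be sketched as follows. This is a minimal illustration that assumes each granule is given as a list of words; the function names mark_granules and split_granules are hypothetical and are not taken from Millett. Only the "<MB>" and "<ME>" marker strings come from the cited passage.

```python
MB = "<MB>"  # marker for the beginning of a granule (from the cited passage)
ME = "<ME>"  # marker for the end of a granule (from the cited passage)


def mark_granules(granules):
    """Flatten granules (lists of words) into one stream with boundary markers."""
    stream = []
    for granule in granules:
        stream.append(MB)
        stream.extend(granule)
        stream.append(ME)
    return stream


def split_granules(stream):
    """Recover the granules from a marked stream by reading the markers."""
    granules, current = [], None
    for token in stream:
        if token == MB:
            current = []
        elif token == ME and current is not None:
            granules.append(current)
            current = None
        elif current is not None:
            current.append(token)
    return granules


stream = mark_granules([["thank", "you"], ["thank", "you", "very", "much"]])
print(stream)
print(split_granules(stream))
```

In this sketch the markers act as extra tokens in the word stream, so granules of different lengths can be delimited without any side table.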

Re claims 2 and 15, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 1, wherein the higher-level N-gram 
language model (Col. 6 lines 11-20) generation and accumulation unit and the lower- 
level N-gram language model generation and accumulation unit generate the respective 
language models (Col. 4 lines 4-55 & Fig. 2), using different corpuses (Col. 7 line 21 - 
Col. 8 line 19). 

However, Rigazio fails to teach the higher-level N-gram language model of
claim 1. 

Deligne teaches well-known limitations of previous technology, wherein Deligne
teaches that class versions of phrase-based models can be defined in a way similar to the
way class versions of N-gram models are defined, i.e., by assigning class labels to the
phrases. In the prior art, this consists of first assigning word class labels to the words and
then defining a phrase class label for each distinct phrase of word class labels. A
drawback of this approach is that only phrases of the same length can be assigned the
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 

Further, Deligne improves on these limitations by explaining the clustering
(classification process) of variable-length phrases. Recently, class-phrase
based models have gained some attention, but usually, as in Prior Art
Reference 1, they assume a previous clustering of the words. Typically, each word is first
assigned a word-class label C_k, then variable-length phrases, wherein the phrases
"thank you for" and "thank you very much for" cannot be assigned the same class label.
In the present preferred embodiment, it is proposed to address this limitation by directly
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at both a paradigmatic and
a syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Rigazio to incorporate higher level 
language modeling as taught by Deligne to allow for optimal class assignment to 
account for sentence and word based modeling in speech recognition (Deligne Col. 10 
lines 43-60). 

Re claims 3 and 16, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 2, wherein the lower-level N-gram language 
model (Col. 6 lines 11-20) generation and accumulation unit includes a corpus update
unit operable to update the corpus (Col. 12 lines 23-41) for the lower-level N-gram 
language model (Col. 4 lines 4-55 & Fig. 2), 

the lower-level N-gram language model generation and accumulation unit 
updates the lower-level N-gram language model based on the updated corpus (Col. 12 
lines 23-41), and generates the updated lower-level N-gram language model (Col. 4 
lines 4-55 & Fig. 2). 

Re claims 4 and 17, Rigazio teaches the language model generation and accumulation apparatus
according to Claim 1, wherein the lower-level N-gram language model (Col. 6 lines 11- 
20) generation and accumulation unit analyzes the first sequence of words (Col. 4 lines 
4-55 & Fig. 2), and generates the lower-level N-gram language model by modeling each 
sequence of the one or more morphemes based on the word string class (Col. 4 lines
4-55 & Fig. 2).

However, Rigazio fails to teach analyzing the first sequence of words within the
word string class into one or more morphemes that are the smallest language units 
having meanings. 

Deligne teaches that the N-gram class model is defined as a language model 
that approximates a word N-gram in combinations of occurrence distributions of word- 
class N-grams and class-based words as shown by the following equation (this equation 
becomes equivalent to an HMM equation in morphological or morphemic analysis if 
word classes are replaced by parts of speech) (Deligne Col. 18 lines 1-16).
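
For illustration only, a class-based bigram of the general kind described above is commonly written as P(w_i | w_{i-1}) ~ P(c_i | c_{i-1}) * P(w_i | c_i), where c_i is the class of word w_i. That textbook form is an assumption here: this Office action does not reproduce Deligne's equation, so the sketch below should not be read as that equation. All table values and names (WORD_TO_CLASS, CLASS_BIGRAM, WORD_GIVEN_CLASS, class_bigram_prob) are hypothetical.

```python
# Hypothetical toy tables; none of these values come from Deligne.
WORD_TO_CLASS = {"thank": "VERB", "you": "PRON", "him": "PRON"}

# P(class_i | class_{i-1}): occurrence distribution of class bigrams.
CLASS_BIGRAM = {("VERB", "PRON"): 0.5}

# P(word | class): occurrence distribution of words within each class.
WORD_GIVEN_CLASS = {("you", "PRON"): 0.75, ("him", "PRON"): 0.25}


def class_bigram_prob(prev_word, word):
    """Approximate P(word | prev_word) through the classes of both words."""
    prev_class = WORD_TO_CLASS[prev_word]
    cur_class = WORD_TO_CLASS[word]
    class_transition = CLASS_BIGRAM.get((prev_class, cur_class), 0.0)
    word_emission = WORD_GIVEN_CLASS.get((word, cur_class), 0.0)
    return class_transition * word_emission


print(class_bigram_prob("thank", "you"))
print(class_bigram_prob("thank", "him"))
```

If the word classes in such a table are parts of speech, the two factors play the roles of transition and emission probabilities in an HMM, which is the equivalence the cited passage refers to.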

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate analyzing the first 
sequence of words within the word string class into one or more morphemes that are 
the smallest language units having meanings as taught by Deligne to allow for a 
multidimensional probabilistic method of prediction used to classify and model speech, 
wherein analysis can be performed on the smallest text units (i.e. morphemes) (Deligne 
Col. 18 lines 1-16). 

Re claims 5 and 18, Rigazio teaches the language model generation and accumulation apparatus
according to Claim 1, wherein the higher-level N-gram language model (Col. 6 lines 11-
20) generation and accumulation unit, and then generates the higher-level N-gram
language model by modeling (Col. 4 lines 30-55 & Fig. 2).

However, Rigazio fails to teach the word string class being included in each of
the plurality of texts analyzed into morphemes.

Deligne teaches that the N-gram class model is defined as a language model 
that approximates a word N-gram in combinations of occurrence distributions of word- 
class N-grams and class-based words as shown by the following equation (this equation 
becomes equivalent to an HMM equation in morphological or morphemic analysis if 
word classes are replaced by parts of speech) (Deligne Col. 18 lines 1-16).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate the word string class 
being included in each of the plurality of texts analyzed into morphemes as taught by 
Deligne to allow for a multidimensional probabilistic method of prediction used to 
classify and model speech, wherein analysis can be performed on the smallest text 
units (i.e. morphemes) (Deligne Col. 18 lines 1-16). 

However, Rigazio in view of Deligne fails to teach a sequence made up of the
virtual word and the other words, and

substituting the word string class with a virtual word.

Millett teaches that it is necessary to increase the current 'virtual word number' 
up to the beginning of the next word cluster boundary whenever the end of a data item 
is reached in the word stream. The only way to know this is to place a marker in the 
word stream signaling the end of a data item. For non-word level indexes, granules
naturally fall within data items, so there is not a problem with a row in the granule cross
reference table referring to more than one data item (Millett Col. 11 lines 19-29).

Further, Millett teaches that granule boundary markers 58 are used to demarcate 
the beginning and end of granules (e.g., "<MB>" for the beginning of a granule and 
"<ME>" for the end of a granule 60), as shown in FIG. 2. As used herein, the term 
"granule" and its derivatives refers to a predetermined set of text, or an indexing unit. 
The granule size determines the degree to which the location of a word within a 
document can be determined. For example, a document level granularity would be able 
to identify the document in which a word appears but not the page or paragraph. A 
paragraph level granularity would be able to more precisely identify the paragraph
within a document where a word appears, while a word level granularity would be able 
to identify the sequential word location of a word (e.g., the first word of the document, 
the second word of the document, etc.). As the granularity increases and approaches 
word level granularity, the size and complexity of an index increases, but word locations 
can be more precisely defined. The purpose of the word stream 44 is to track the 
granules in which a word occurs, not the total number of occurrences of the word. 
(Millett Col. 4 line 50 - Col. 5 line 15). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio in view of Deligne to incorporate 
substituting the word string class with a virtual word as taught by Millett to allow for the 
identification of varying text (i.e. phrases or words or paragraphs), wherein markers (i.e.
virtual words) are used to tag the beginning and end of the specified granule (i.e.
phrases, words, paragraphs, etc.) (Millett Col. 4 line 50 - Col. 5 line 15). 

Re claims 7, 9, and 22, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 1, further comprising

a syntactic tree generation unit operable to perform morphemic analysis as well 
as syntactic analysis of a text (Col. 5 lines 42-63), and generate a syntactic tree in 
which the text is structured by a plurality of layers, focusing on a node that is on
the syntactic tree (Col. 5 lines 42-63) and that has been selected on the basis of a
predetermined criterion (Col. 4 lines 4-55 & Fig. 2), 

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit generates the higher-level N-gram language model for syntactic 
tree, using a first subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes an upper layer 
from the focused node (Col. 4 lines 4-55 & Fig. 2), and 

the lower-level N-gram language model (Col. 6 lines 11-20) generation and
accumulation unit generates the lower-level N-gram language model for syntactic tree,
using a second subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes a lower layer from
the focused node (Col. 4 lines 4-55 & Fig. 2).

However, Rigazio fails to teach morphemic analysis.

Deligne teaches that the N-gram class model is defined as a language model 
that approximates a word N-gram in combinations of occurrence distributions of word-
class N-grams and class-based words as shown by the following equation (this equation
becomes equivalent to an HMM equation in morphological or morphemic analysis if
word classes are replaced by parts of speech) (Deligne Col. 18 lines 1-16).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate morphemic analysis 
as taught by Deligne to allow for a multidimensional probabilistic method of prediction 
used to classify and model speech, wherein analysis can be performed on the smallest 
text units (i.e. morphemes) (Deligne Col. 18 lines 1-16). 

Further, Rigazio fails to teach the higher-level N-gram language model of claim 1.

Deligne teaches well-known limitations of previous technology, wherein Deligne
teaches that class versions of phrase-based models can be defined in a way similar to the
way class versions of N-gram models are defined, i.e., by assigning class labels to the
phrases. In the prior art, this consists of first assigning word class labels to the words and
then defining a phrase class label for each distinct phrase of word class labels. A
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 

Further, Deligne improves on these limitations by explaining the clustering
(classification process) of variable-length phrases. Recently, class-phrase
based models have gained some attention, but usually, as in Prior Art
Reference 1, they assume a previous clustering of the words. Typically, each word is first
assigned a word-class label C_k, then variable-length phrases, wherein the phrases
"thank you for" and "thank you very much for" cannot be assigned the same class label.
In the present preferred embodiment, it is proposed to address this limitation by directly
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at both a paradigmatic and
a syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate higher level 
language modeling as taught by Deligne to allow for optimal class assignment to 
account for sentence and word based modeling in speech recognition (Deligne Col. 10 
lines 43-60).

Re claim 31, Rigazio teaches the language model generation and accumulation
apparatus according to claim 1, 

wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit is operable to represent a first sequence of words having a 
common linguistic property (Fig. 1 features) as the word string class, to generate and to 
accumulate, for each word string class, the lower-level N-gram language model that is 
obtained by modeling the first sequence of words included in the word string class (Col. 
4 lines 30-55 & Fig. 2); and 

the lower-level N-gram language model generation and accumulation unit is 
operable to generate and accumulate, for each word string class, the first sequence of 
words having the linguistic property (Fig. 1 features) indicated by the word string class 
(Col. 4 lines 30-55 & Fig. 2).

However, Rigazio fails to teach that each word included in the first sequence of
words and each word included in the second sequence of words are respectively
morphemes which are smallest linguistic units that have meaning, and

replacing the first sequence of words modeled in the lower-level N-gram language
model included in a text which is the sequence of words with a word string class
corresponding to the first sequence of words.

Deligne teaches that the N-gram class model is defined as a language model 
that approximates a word N-gram in combinations of occurrence distributions of word-
class N-grams and class-based words as shown by the following equation (this equation
becomes equivalent to an HMM equation in morphological or morphemic analysis if
word classes are replaced by parts of speech) (Deligne Col. 18 lines 1-16).

Rigazio further fails to teach that the higher-level N-gram language model generation
and accumulation unit is operable to replace the first sequence of words modeled in the
lower-level N-gram language model included in a text which is the sequence of words
with a word string class corresponding to the first sequence of words, and to generate and
to accumulate a higher-level N-gram language model that is obtained by modeling the
text which is the character string as a sequence of words that includes the word string
class and a second sequence of words.

Deligne also teaches well-known limitations of previous technology, wherein
Deligne teaches that class versions of phrase-based models can be defined in a way similar
to the way class versions of N-gram models are defined, i.e., by assigning class labels to
the phrases. In the prior art, this consists of first assigning word class labels to the words and
then defining a phrase class label for each distinct phrase of word class labels. A
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 

Further, Deligne improves on these limitations by explaining the clustering
(classification process) of variable-length phrases. Recently, class-phrase
based models have gained some attention, but usually, as in Prior Art
Reference 1, they assume a previous clustering of the words. Typically, each word is first
assigned a word-class label C_k, then variable-length phrases, wherein the phrases
"thank you for" and "thank you very much for" cannot be assigned the same class label.
In the present preferred embodiment, it is proposed to address this limitation by directly
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at both a paradigmatic and
a syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate each word included 
in the first sequence of words and each word included in the second sequence of words 
are respectively morphemes which are smallest linguistic units that have meaning, 
replace the first sequence of words modeled in the lower-level N-gram language model
included in a text which is the sequence of words with a word string class corresponding
to the first sequence of words, and the higher-level N-gram language model generation
and accumulation unit is operable to replace the first sequence of words modeled in the
lower-level N-gram language model included in a text which is the sequence of words
with a word string class corresponding to the first sequence of words, and to generate
and to accumulate a higher-level N-gram language model that is obtained by modeling
the text which is the character string as a sequence of words that includes the word
string class and a second sequence of words as taught by Deligne to allow for a
multidimensional probabilistic method of prediction used to classify and model speech,
wherein analysis can be performed on the smallest text units (i.e. morphemes) (Deligne
Col. 18 lines 1-16), as well as optimal class assignment to account for sentence and word
based modeling in speech recognition (Deligne Col. 10 lines 43-60).

Re claims 32-35, Rigazio teaches the speech recognition apparatus according to 
Claim 14, wherein, in the speech recognized from an input speech, 

an alignment of words is recognized from an input speech, by referring to a
recognition dictionary which describes pronunciation of the words (Col. 7 line 20 - Col. 
8 line 20), 

a sequence of words including the word string class is assumed in the alignment 
of words (Col. 4 lines 4-55 & Fig. 2).

However, Rigazio fails to teach the input speech is recognized based on (i) a 
probability that the words including the word string class appear in an order of 
appearance in the assumed sequence of words and (ii) a probability of an appearance 
of the words or the virtual word denoting the end of the word string class in an order of
appearance in the word string class.

Deligne teaches that the N-gram class model is defined as a language model 
that approximates a word N-gram in combinations of occurrence distributions of
word-class N-grams and class-based words as shown by the following equation (this
equation becomes equivalent to an HMM equation in morphological or morphemic
analysis if word classes are replaced by parts of speech) (Deligne Col. 18 lines 1-16).
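
For orientation only, a class N-gram approximation of this kind is commonly written in the bigram form below. This is the textbook formulation offered as an illustrative sketch; it is not asserted to be the exact equation at Deligne Col. 18, and the symbols $w_i$ (the i-th word) and $c_i$ (the class assigned to $w_i$) are assumed notation.

```latex
P(w_i \mid w_{i-1}) \approx P(c_i \mid c_{i-1}) \, P(w_i \mid c_i)
```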

Deligne also teaches well known limitations of previous technology, wherein 
Deligne teaches class versions of phrase based models can be defined in a way similar 
to the way class version of N-gram models are defined, i.e., by assigning class labels to 
the phrases. In prior art it consists in first assigning word class labels to the words, and 
in then defining a phrase class label for each distinct phrase of word class labels. A 
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 
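
The length limitation described above can be illustrated with a minimal sketch. The word-to-class mapping and the function name are hypothetical and are not taken from Deligne; the sketch only shows that labeling a phrase by the sequence of its word class labels yields different labels for phrases of different lengths.

```python
# Hypothetical word-to-class mapping, for illustration only (not from Deligne).
WORD_CLASS = {"thank": "C1", "you": "C2", "very": "C3", "much": "C4"}


def phrase_class_label(phrase):
    """Label a phrase by the sequence of its word class labels (prior-art scheme)."""
    return tuple(WORD_CLASS[word] for word in phrase.split())


short = phrase_class_label("thank you")
long_ = phrase_class_label("thank you very much")

# The two label sequences have different lengths, so under this scheme the
# two phrases can never share a phrase class label.
print(short)
print(long_)
print(short == long_)
```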

Further, Deligne improves these limitations by teaching the clustering 
(classification process) of the variable-length phrases is explained. Recently, class- 
phrase based models have gained some attention, but usually like in Prior Art 
Reference 1, it assumes a previous clustering of the words. Typically, each word is first
assigned a word-class label C.sub.k, then variable-length phrases, wherein the phrases 
"thank you for" and "thank you very much for" cannot be assigned the same class label. 
In the present preferred embodiment, it is proposed to address this limitation by directly
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at a both paradigmatic and 
syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are 
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate the input speech is 
recognized based on (i) a probability that the words including the word string class 
appear in an order of appearance in the assumed sequence of words as taught by 
Deligne to allow for optimal class assignment to account for sentence and word based 
modeling in speech recognition (Deligne Col. 10 lines 43-60). 

However, Rigazio in view of Deligne fails to teach the virtual word denoting the
end of the word string class in an order of appearance in the word string class.

Millett teaches that it is necessary to increase the current 'virtual word number'
up to the beginning of the next word cluster boundary whenever the end of a data item 
is reached in the word stream. The only way to know this is to place a marker in the 
word stream signaling the end of a data item. For non-word level indexes, granules 
naturally fall within data items, so there is not a problem with a row in the granule cross 
reference table referring to more than one data item (Millett Col. 11 lines 19-29).

Further, Millett teaches that granule boundary markers 58 are used to demarcate 
the beginning and end of granules (e.g., "<MB>" for the beginning of a granule and 
"<ME>" for the end of a granule 60), as shown in FIG. 2. As used herein, the term 
"granule" and its derivatives refers to a predetermined set of text, or an indexing unit. 
The granule size determines the degree to which the location of a word within a 
document can be determined. For example, a document level granularity would be able 
to identify the document in which a word appears but not the page or paragraph. A 
paragraph level granularity would be able to more precisely identify the paragraph
within a document where a word appears, while a word level granularity would be able 
to identify the sequential word location of a word (e.g., the first word of the document, 
the second word of the document, etc.). As the granularity increases and approaches 
word level granularity, the size and complexity of an index increases, but word locations 
can be more precisely defined. The purpose of the word stream 44 is to track the 
granules in which a word occurs, not the total number of occurrences of the word. 
(Millett Col. 4 line 50 - Col. 5 line 15). 
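
The use of boundary markers described above can be sketched as follows. This is a minimal illustration that assumes the word stream is a flat list of tokens; only the marker strings "<MB>" and "<ME>" come from the cited passage, while the function name and the sample stream are hypothetical.

```python
def split_granules(word_stream):
    """Group the tokens found between <MB> and <ME> markers into granules."""
    granules = []
    current = None  # None means we are not inside a granule
    for token in word_stream:
        if token == "<MB>":
            current = []
        elif token == "<ME>":
            if current is not None:
                granules.append(current)
            current = None
        elif current is not None:
            current.append(token)
    return granules


# Hypothetical word stream containing two granules.
stream = ["<MB>", "thank", "you", "<ME>", "<MB>", "very", "much", "<ME>"]
granules = split_granules(stream)
print(granules)
```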

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Rigazio in view of Deligne to incorporate 
the virtual word denoting the end of the word string class in an order of appearance in 
the word string class as taught by Millett to allow for the identification of varying text (i.e. 
phrases or words or paragraphs), wherein markers (i.e. virtual words) are used to tag 
the beginning and end of the specified granule (i.e. phrases, words, paragraphs, etc.) 
(Millett Col. 4 line 50 - Col. 5 line 15). 

6. Claims 6, 8, 10-12, 19-21, and 23-25 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Rigazio et al. US 6182039 B1 (hereinafter Rigazio) in view of 
Deligne et al. US 6314399 B1 (hereinafter Deligne) and Millett et al. US 6584458 B1 
(hereinafter Millett) and further in view of Hwang et al. US 20020082831 A1 
(hereinafter Hwang). 

Re claims 6 and 19, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 1, 

wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit includes an exception word judgment unit operable to judge 
whether or not a specific word out of a plurality of words that appear in the word string 
class should be treated as an exception word (Col. 4 lines 4-55 & Fig. 2), based on a 
linguistic property of the specific word, and divides the exception word into (i) a syllable 
that is a basic phonetic unit constituting a pronunciation of the exception word (Col. 4 
lines 4-55 & Fig. 2) and (ii) a unit that is obtained by combining syllables based on a
judgment result the exception word being (Col. 4 lines 4-55 & Fig. 2), 

the language model generation and accumulation apparatus further comprises a 
class dependent syllable N-gram generation and accumulation unit operable to 
generate class dependent syllable N-grams by modeling a sequence made up of the 
syllable and the unit obtained by combining syllables and by providing a language 
likelihood (Col. 1 lines 31-39) to the sequence in dependency on either the word string 
class or the linguistic property of the exception word (Col. 4 lines 4-55 & Fig. 2), 

However, Rigazio fails to teach a higher-level N-gram language model

the language likelihood being a logarithm value of a probability. 

Deligne teaches well known limitations of previous technology, wherein Deligne 
teaches class versions of phrase based models can be defined in a way similar to the 
way class version of N-gram models are defined, i.e., by assigning class labels to the 
phrases. In prior art it consists in first assigning word class labels to the words, and in 
then defining a phrase class label for each distinct phrase of word class labels. A 
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 

Further, Deligne improves these limitations by teaching the clustering 
(classification process) of the variable-length phrases is explained. Recently, class- 
phrase based models have gained some attention, but usually like in Prior Art 
Reference 1, it assumes a previous clustering of the words. Typically, each word is first
assigned a word-class label C.sub.k, then variable-length phrases, wherein the phrases 
"thank you for" and "thank you very much for" cannot be assigned the same class label. 
In the present preferred embodiment, it is proposed to address this limitation by directly 
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at a both paradigmatic and 
syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are 
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Additionally, Deligne teaches the use of a logarithmic probability in relation to
n-gram word classification (Deligne Col. 18 lines 25-40).
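
As a brief illustration of why a language likelihood is often held as a logarithm of a probability, the sketch below shows that summing log probabilities is equivalent to multiplying the probabilities themselves. The probability values and the choice of base 2 are assumptions made for the example and are not taken from Deligne.

```python
import math

# Hypothetical N-gram probabilities for three successive units.
probs = [0.5, 0.25, 0.125]

# Summing logarithms replaces the product of probabilities, which avoids
# numerical underflow when many small probabilities are combined.
log_likelihood = sum(math.log2(p) for p in probs)
product = math.prod(probs)

print(log_likelihood)
print(product)
```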

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate a higher-level N- 
gram language model and a language likelihood being a logarithm value of a probability 
as taught by Deligne to allow for a well known probabilistic method of prediction used to 
classify and model speech, wherein analysis can be performed on the smallest text
units (i.e. morphemes) (Deligne Col. 18 lines 1-16). 

However, Rigazio in view of Deligne and Millett fails to teach a word not being
included as a constituent word of the word string class accumulate the generated class
dependent syllable N-grams.

Hwang teaches n-gram analysis of text as well as syllables (well known in the art 
to be non-morphemic, non-word, non-sentence, etc.), wherein Hwang teaches that each 
syllable-like unit is found in SLU language model 512, which in many embodiments is a 
trigram language model. Under one embodiment, each syllable-like unit in language 
model 512 is named such that the name describes all of the phonetic units that make up 
the syllable-like unit. Using this naming strategy, SLU engine 510 is able to identify the 
phonetic units associated with each syllable-like unit simply by examining the name 
associated with the syllable-like unit. For example, the syllable-like unit named 
EH_K_S, which is the first syllable in the word "exclamation", contains the phonemes 
EH, K and S (Hwang [0064]). 
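
The naming strategy described above can be sketched in a few lines. The function name is hypothetical, and the sketch assumes that the phonetic units in a name are separated by underscores, as in the EH_K_S example cited from Hwang.

```python
def phonemes_from_slu_name(name):
    """Recover the phonetic units of a syllable-like unit from its name."""
    return name.split("_")


phonemes = phonemes_from_slu_name("EH_K_S")
print(phonemes)
```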

Further, Hwang teaches SLU engine 510 updates the score for a hypothesized 
sequence of syllable-like units by adding the language model score and acoustic model 
score of the next syllable-like unit to the sequence score. SLU engine 510 calculates 
the language model score based on the model score stored in SLU language model 512 
for the next syllable-like unit to be added to the hypothesized sequence. In one 
embodiment, SLU language model 512 is a trigram model, and the model score is 



Application/Control Number: 10/520,922 Page 28 

Art Unit: 2626 

based on the next syllable-like unit and the last two syllable-like units in the sequence of 
units (Hwang [0066]). 
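
The score update described above can be sketched as follows, assuming log-domain scores so that scores are combined by addition. The function name, the unit names other than EH_K_S, and all numeric values are hypothetical and serve only to illustrate a trigram lookup keyed on the last two units of the hypothesized sequence.

```python
def update_sequence_score(sequence, score, next_unit, lm_scores, acoustic_score):
    """Extend a hypothesized sequence and add the LM and acoustic scores."""
    context = tuple(sequence[-2:])  # last two syllable-like units
    lm_score = lm_scores[(context, next_unit)]  # trigram language model score
    return sequence + [next_unit], score + lm_score + acoustic_score


# Hypothetical trigram table: (two previous units, next unit) -> log score.
lm_scores = {(("EH_K_S", "K_L_AH"), "M_EY"): -1.5}

new_sequence, new_score = update_sequence_score(
    ["EH_K_S", "K_L_AH"], -4.0, "M_EY", lm_scores, acoustic_score=-2.25
)
print(new_sequence, new_score)
```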

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio in view of Deligne and Millett to 
incorporate a word not being included as a constituent word of the word string class 
accumulate the generated class dependent syllable N-grams as taught by Hwang to 
allow for the proper identification of non-textual units such as syllables, wherein 
modeling can be phonetically implemented after progressing from paragraph to 
morpheme to syllable to find the combination/sequence of syllable that form an overall 
textual element located within text (Hwang [0064]). 

Re claims 8, 10, and 23, Rigazio teaches the language model (Col. 6 lines 11-20) 
generation and accumulation apparatus according to Claim 7, 

wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit includes 

a language model generation exception word judgment unit operable to judge a 
specific word appearing in the second subtree (Col. 5 lines 42-63) as an exception word
based on a predetermined linguistic property (Col. 4 lines 30-55 & Fig. 2), the exception
word being a word not being included as a constituent word of any subtree (Col. 4 lines
30-55 & Fig. 2),

the lower-level N-gram language model generation and accumulation unit
generates the lower-level N-gram language model (Col. 4 lines 30-55 & Fig. 2) by
dividing the exception word into (i) a syllable that is a basic phonetic unit constituting a
pronunciation of the word (Col. 4 lines 30-55 & Fig. 2) and (ii) a unit that is obtained by
combining syllables, and then by modeling a sequence made up of the syllable and the
unit obtained by combining syllables in dependency on a location of the exception word
in the syntactic tree (Col. 5 lines 42-63) and on the linguistic property of the exception
word (Col. 4 lines 30-55 & Fig. 2).

However, Rigazio in view of Deligne and Millett fails to teach a word not being
included as a constituent word of the word string class accumulate the generated class
dependent syllable N-grams.

Hwang teaches n-gram analysis of text as well as syllables (well known in the art 
to be non-morphemic, non-word, non-sentence, etc.), wherein Hwang teaches that each 
syllable-like unit is found in SLU language model 512, which in many embodiments is a 
trigram language model. Under one embodiment, each syllable-like unit in language 
model 512 is named such that the name describes all of the phonetic units that make up 
the syllable-like unit. Using this naming strategy, SLU engine 510 is able to identify the 
phonetic units associated with each syllable-like unit simply by examining the name 
associated with the syllable-like unit. For example, the syllable-like unit named 
EH_K_S, which is the first syllable in the word "exclamation", contains the phonemes 
EH, K and S (Hwang [0064]). 

Further, Hwang teaches SLU engine 510 updates the score for a hypothesized 
sequence of syllable-like units by adding the language model score and acoustic model 
score of the next syllable-like unit to the sequence score. SLU engine 510 calculates
the language model score based on the model score stored in SLU language model 512
for the next syllable-like unit to be added to the hypothesized sequence. In one 
embodiment, SLU language model 512 is a trigram model, and the model score is 
based on the next syllable-like unit and the last two syllable-like units in the sequence of 
units (Hwang [0066]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio in view of Deligne and Millett to 
incorporate dividing the exception word into (i) a syllable that is a basic phonetic unit 
constituting a pronunciation of the word and (ii) a unit that is obtained by combining 
syllables, and then by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables in dependency on a location of the exception word as 
taught by Hwang to allow for the proper identification of non-textual units such as 
syllables, wherein modeling can be phonetically implemented after progressing from 
paragraph to morpheme to syllable to find the combination/sequence of syllable that 
form an overall textual element located within text (Hwang [0064]). 

Re claims 11 and 12, Rigazio teaches the language model generation and
accumulation apparatus according to Claim 1, 

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit generates the higher-level N-gram language model in which each
(Col. 4 lines 30-55 & Fig. 2)

However, Rigazio fails to teach a sequence of N words including the word string
class is associated a probability at which said each sequence of N words 

analyzing the first sequence of words within the word string class into one or 
more morphemes that are the smallest language units having meanings. 

Deligne teaches well known limitations of previous technology, wherein Deligne 
teaches class versions of phrase based models can be defined in a way similar to the 
way class version of N-gram models are defined, i.e., by assigning class labels to the 
phrases. In prior art it consists in first assigning word class labels to the words, and in 
then defining a phrase class label for each distinct phrase of word class labels. A 
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 

Further, Deligne improves these limitations by teaching the clustering 
(classification process) of the variable-length phrases is explained. Recently, class- 
phrase based models have gained some attention, but usually like in Prior Art 
Reference 1, it assumes a previous clustering of the words. Typically, each word is first
assigned a word-class label C.sub.k, then variable-length phrases, wherein the phrases 
"thank you for" and "thank you very much for" cannot be assigned the same class label. 
In the present preferred embodiment, it is proposed to address this limitation by directly 
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at a both paradigmatic and 
syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are 
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Additionally, Deligne teaches the use of a logarithmic probability in relation to
n-gram word classification (Deligne Col. 18 lines 25-40).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate a sequence of N 
words including the word string class is associated a probability at which said each 
sequence of N words analyzing the first sequence of words within the word string class 
into one or more morphemes that are the smallest language units having meanings as 
taught by Deligne to allow for a well known probabilistic method of prediction used to 
classify and model speech, wherein analysis can be performed on the smallest text 
units (i.e. morphemes) (Deligne Col. 18 lines 1-16).

Re claim 20, Rigazio teaches the language model generation and accumulation
apparatus according to Claim 19, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well 
as syntactic analysis of a text (Col. 5 lines 42-63), and generate a syntactic tree in 
which the text is structured by a plurality of layers, focusing on a node that is on
the syntactic tree (Col. 5 lines 42-63) and that has been selected on the basis of a
predetermined criterion (Col. 4 lines 4-55 & Fig. 2), 

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit generates the higher-level N-gram language model for syntactic 
tree, using a first subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes an upper layer 
from the focused node (Col. 4 lines 4-55 & Fig. 2), and 

the lower-level N-gram language model (Col. 6 lines 11-20) generation and
accumulation unit generates the lower-level N-gram language model for syntactic tree, 
using a second subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes a lower layer from 
the focused node (Col. 4 lines 4-55 & Fig. 2) 

the speech recognition apparatus comprises: 

an acoustic processing unit operable to generate feature parameters from the 
speech (Col. 4 lines 30-55 & Fig. 2);

a word comparison unit operable to compare a pronunciation of each word with 
each of the feature parameters (Col. 4 lines 30-55 & Fig. 2), and generate a set of word
hypotheses including an utterance segment of each word and an acoustic likelihood of 
each word (Col. 1 lines 31-39);

a word string hypothesis (Col. 12 lines 23-41) generation unit operable to
generate a word string hypothesis from the set of word hypotheses with reference to the 
higher-level N-gram language model for syntactic tree (Col. 5 lines 42-63) and the 
lower-level N-gram language model for syntactic tree (Col. 5 lines 42-63), and generate 
a result of the speech recognition 

However, Rigazio fails to teach a higher level n-gram modeling and morphemic 
analysis 

Deligne teaches that the N-gram class model is defined as a language model 
that approximates a word N-gram in combinations of occurrence distributions of
word-class N-grams and class-based words as shown by the following equation (this
equation becomes equivalent to an HMM equation in morphological or morphemic
analysis if word classes are replaced by parts of speech) (Deligne Col. 18 lines 1-16).

Deligne also teaches well known limitations of previous technology, wherein 
Deligne teaches class versions of phrase based models can be defined in a way similar 
to the way class version of N-gram models are defined, i.e., by assigning class labels to 
the phrases. In prior art it consists in first assigning word class labels to the words, and 
in then defining a phrase class label for each distinct phrase of word class labels. A 
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20).

Further, Deligne improves these limitations by teaching the clustering
(classification process) of the variable-length phrases is explained. Recently, class- 
phrase based models have gained some attention, but usually like in Prior Art 
Reference 1, it assumes a previous clustering of the words. Typically, each word is first
assigned a word-class label C.sub.k, then variable-length phrases, wherein the phrases 
"thank you for" and "thank you very much for" cannot be assigned the same class label. 
In the present preferred embodiment, it is proposed to address this limitation by directly 
clustering phrases instead of words (Deligne Col. 10 lines 43-60).

Furthermore, Deligne teaches the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at a both paradigmatic and 
syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are 
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate each word included 
in the first sequence of words and each word included in the second sequence of words 
are respectively morphemes which are smallest linguistic units that have meaning, 
replace the first sequence of words modeled in the lower-level N-grams language model
included in a text which is the sequence of words with a word string class corresponding 
to the first sequence of word, and the higher-level N-gram language model generation
and accumulation unit is operable to replace the first sequence of words modeled in the 
lower-level N-grams language model included in a text which is the sequence of words 
with a word string class corresponding to the first sequence of word, and to generate 
and to accumulate a higher-level N-gram language model that is obtained by modeling
the text which is the character string as a sequence of words that includes the word 
string class and a second sequence of words as taught by Deligne to allow for a 
multidimensional probabilistic method of prediction used to classify and model speech, 
wherein analysis can be performed on the smallest text units (i.e. morphemes) (Deligne
Col. 18 lines 1-16) as well as optimal class assignment to account for sentence and word
based modeling in speech recognition (Deligne Col. 10 lines 43-60).

Re claim 21, Rigazio teaches the apparatus according to Claim 20,
wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit includes 

a language model generation exception word judgment unit operable to judge a 
specific word appearing in the second subtree (Col. 5 lines 42-63) as an exception word
based on a predetermined linguistic property (Col. 4 lines 30-55 & Fig. 2), the exception
word being a word not being included as a constituent word of any subtree (Col. 4 lines
30-55 & Fig. 2),

the lower-level N-gram language model generation and accumulation unit
generates the lower-level N-gram language model (Col. 4 lines 30-55 & Fig. 2) by
dividing the exception word into (i) a syllable that is a basic phonetic unit constituting a
pronunciation of the word (Col. 4 lines 30-55 & Fig. 2) and (ii) a unit that is obtained by
combining syllables, and then by modeling a sequence made up of the syllable and the
unit obtained by combining syllables in dependency on a location of the exception word
in the syntactic tree (Col. 5 lines 42-63) and on the linguistic property of the exception
word (Col. 4 lines 30-55 & Fig. 2)

the word string hypothesis generation unit generates the result of the speech 
recognition (Col. 12 lines 23-41). 

However, Rigazio in view of Deligne and Millett fails to teach a word not being
included as a constituent word of the word string class accumulate the generated class
dependent syllable N-grams.

Hwang teaches n-gram analysis of text as well as syllables (well known in the art 
to be non-morphemic, non-word, non-sentence, etc.), wherein Hwang teaches that each 
syllable-like unit is found in SLU language model 512, which in many embodiments is a 
trigram language model. Under one embodiment, each syllable-like unit in language 
model 512 is named such that the name describes all of the phonetic units that make up 
the syllable-like unit. Using this naming strategy, SLU engine 510 is able to identify the 
phonetic units associated with each syllable-like unit simply by examining the name 
associated with the syllable-like unit. For example, the syllable-like unit named 
EH_K_S, which is the first syllable in the word "exclamation", contains the phonemes
EH, K and S (Hwang [0064]). 

Further, Hwang teaches SLU engine 510 updates the score for a hypothesized 
sequence of syllable-like units by adding the language model score and acoustic model 
score of the next syllable-like unit to the sequence score. SLU engine 510 calculates 
the language model score based on the model score stored in SLU language model 512 
for the next syllable-like unit to be added to the hypothesized sequence. In one 
embodiment, SLU language model 512 is a trigram model, and the model score is 
based on the next syllable-like unit and the last two syllable-like units in the sequence of 
units (Hwang [0066]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio in view of Deligne and Millett to 
incorporate a word not being included as a constituent word of the word string class 
accumulate the generated class dependent syllable N-grams as taught by Hwang to 
allow for the proper identification of non-textual units such as syllables, wherein 
modeling can be phonetically implemented after progressing from paragraph to 
morpheme to syllable to find the combination/sequence of syllable that form an overall 
textual element located within text (Hwang [0064]). 

Re claims 24 and 25, Rigazio teaches the speech recognition apparatus 
according to Claim 14,

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit generates the higher-level N-gram language model in which each
sequence of N words (Col. 4 lines 30-55 & Fig. 2)

the speech recognition apparatus comprises 

a word string hypothesis generation unit operable to evaluate a word string 
hypothesis (Col. 12 lines 23-41). 

However, Rigazio fails to teach a higher-level N-gram language model

a word string class is associated with a probability at which the each sequence of
words

multiplying each probability at which the each sequence of N words including the 
word string class occurs 

Deligne describes well-known limitations of previous technology, teaching that 
class versions of phrase based models can be defined in a way similar to the 
way class versions of N-gram models are defined, i.e., by assigning class labels to the 
phrases. In the prior art, this consists of first assigning word class labels to the words, and 
then defining a phrase class label for each distinct phrase of word class labels. A 
drawback of this approach is that only phrases of the same length can be assigned the 
same class label. For example, the phrases "thank you" and "thank you very much" 
cannot be assigned the same class label, because being of different lengths, they will 
lead to different sequences of word class labels (Deligne Col. 2 lines 10-20). 
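
The length restriction described in the passage above can be illustrated with a short example. This is a minimal sketch, assuming a hypothetical word-to-class mapping (`WORD_CLASS`) and a hypothetical helper (`phrase_class_label`) that are not taken from Deligne; it shows only why a phrase class label derived from the sequence of word class labels cannot coincide for phrases of different lengths.

```python
# Hypothetical word-to-class mapping, invented for illustration.
WORD_CLASS = {"thank": "C1", "you": "C2", "very": "C3", "much": "C4"}

def phrase_class_label(phrase):
    """Label a phrase by the sequence of its word class labels."""
    return tuple(WORD_CLASS[w] for w in phrase.split())

short = phrase_class_label("thank you")
long_ = phrase_class_label("thank you very much")

# The label sequences have different lengths, so they can never be equal.
same_label = short == long_
```

Because the phrase label is the tuple of word class labels, two phrases share a label only if they have the same number of words, which is the drawback the cited passage identifies.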

Further, Deligne addresses these limitations by explaining the clustering 
(classification process) of the variable-length phrases. Recently, class-phrase 
based models have gained some attention, but usually, as in Prior Art 
Reference 1, it assumes a previous clustering of the words. Typically, each word is first 
assigned a word-class label C_k, then variable-length phrases, wherein the phrases 
"thank you for" and "thank you very much for" cannot be assigned the same class label. 
In the present preferred embodiment, it is proposed to address this limitation by directly 
clustering phrases instead of words (Deligne Col. 10 lines 43-60). 

Furthermore, Deligne teaches that the step ensures that the class assignment based 
on the mutual information criterion is optimal with respect to the current phrase 
distribution, and the step SS2 ensures that the bigram distribution of the phrases 
optimizes the likelihood calculated according to Equation (19) with the current class 
distribution. The training data are thus iteratively structured at both a paradigmatic and 
a syntagmatic level in a fully integrated way (the terms paradigmatic and syntagmatic are 
both linguistic terms). That is, the paradigmatic relations between the phrases 
expressed by the class assignment influence the reestimation of the bigram distribution 
of the phrases, while the bigram distribution of the phrases determines the subsequent 
class assignment (Deligne Col. 11 lines 29-43). 

Additionally, Deligne teaches the use of a logarithmic probability in relation to 
n-gram word classification (Deligne Col. 18 lines 25-40). 
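
As general background on why logarithmic probabilities relate to multiplying probabilities, the following minimal sketch may help. It is not drawn from Deligne; the function name `sequence_log_prob` and the probability values are hypothetical. It relies only on the identity that the logarithm of a product equals the sum of the logarithms.

```python
import math

def sequence_log_prob(probs):
    """Sum of log-probabilities, equal to the log of their product."""
    return sum(math.log(p) for p in probs)

# Hypothetical per-step probabilities, invented for illustration.
probs = [0.5, 0.25, 0.1]

log_p = sequence_log_prob(probs)   # sum of logs
product = math.prod(probs)         # direct multiplication (Python 3.8+)
```

Exponentiating `log_p` recovers `product` up to floating-point error. Summing logarithms is commonly preferred for long sequences because a direct product of many small probabilities can underflow.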

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Rigazio to incorporate a higher-level 
N-gram language model, wherein a word string class is associated with a probability at 
which the each sequence of words, and multiplying each probability at which the each 
sequence of N words including the word string class occurs, as taught by Deligne, to allow 
for optimal probabilistic class assignment to account for sentence and word based 
modeling in speech recognition (Deligne Col. 10 lines 43-60). 



Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Michael C. Colucci, whose telephone number is 
(571)-270-1847. The examiner can normally be reached on 9:30 am - 6:00 pm, 
Monday-Friday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached at (571)-272-7602. The fax phone 
number for the organization where this application or proceeding is assigned is 
571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 